<!-- reveal_plugins: ["notes"] -->
```r
sessions %>% select(-details)
```
```
# A tibble: 100 x 7
   session spins  hits wager payout  hit_rate       rtp
     <int> <int> <int> <int>  <dbl>     <dbl>     <dbl>
 1       1     7     2    10      3 0.2857143 0.3000000
 2       2    19     7    30     29 0.3684211 0.9666667
 3       3    19     3    22      3 0.1578947 0.1363636
 4       4    26     7    30     13 0.2692308 0.4333333
 5       5    23     8    31     35 0.3478261 1.1290323
 6       6    20     8    26     12 0.4000000 0.4615385
 7       7    22     5    30     20 0.2272727 0.6666667
 8       8    22     4    25     10 0.1818182 0.4000000
 9       9    18     4    26      6 0.2222222 0.2307692
10      10    26     8    33     75 0.3076923 2.2727273
# ... with 90 more rows
```
```r
sessions %>% select(session, details) %>% unnest()
```
```
# A tibble: 1,972 x 4
   session  spin wager payout
     <int> <int> <int>  <dbl>
 1       1     1     1      1
 2       1     2     1      0
 3       1     3     1      0
 4       1     4     3      0
 5       1     5     2      0
 6       1     6     1      2
 7       1     7     1      0
 8       2     1     2      0
 9       2     2     2      0
10       2     3     1      1
# ... with 1,962 more rows
```
Probability distribution for a binomial process:
\[ P(k | n, \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n - k} \]
where
\[ \text{hit rate} = \theta^* = \frac{\text{hits}}{\text{spins}} \]
```r
with(sessions, sum(hits) / sum(spins))
```
```
[1] 0.3128803
```
```
# A tibble: 1 x 2
  session_avg session_std
        <dbl>       <dbl>
1   0.3100018   0.1030164
```
The 95% confidence interval extends from 28.9% to 33.1%.
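A sketch of the usual normal-approximation interval, which reproduces the quoted bounds from the session-level summary above (mean plus or minus roughly 1.96 standard errors):

```r
# Values taken from the session-level summary above.
session_avg <- 0.3100018
session_std <- 0.1030164
n_sessions <- 100

# Normal-approximation 95% confidence interval for the mean hit rate.
margin <- qnorm(0.975) * session_std / sqrt(n_sessions)
c(lower = session_avg - margin, upper = session_avg + margin)
# roughly 0.290 and 0.330
```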
For session \(i\) there were \(k_i\) hits from \(n_i\) spins.
Assuming that sessions are independent:
\[ P(k | n, \theta) = \prod_i P(k_i | n_i, \theta) = L(\theta; n, k) \]
Log-likelihood for a binomial process (multiple trials, dropping the constant \(\log \binom{n_i}{k_i}\) terms):
\[ LL(\theta; n, k) = \sum_i \left[ k_i \log{\theta} + (n_i - k_i) \log{(1 - \theta)} \right] \]
```r
# theta - probability [single value]
# k     - number of successes [vector]
# n     - number of trials [vector]
#
binom_log_likelihood <- function(theta, k, n) {
  sum(k * log(theta) + (n - k) * log(1 - theta))
}

parameters %>% arrange(desc(log_likelihood)) %>% head()
```
```
# A tibble: 6 x 2
  theta log_likelihood
  <dbl>          <dbl>
1  0.32      -1225.604
2  0.30      -1226.146
3  0.34      -1228.649
4  0.28      -1230.543
5  0.36      -1235.078
6  0.26      -1239.142
```
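As a cross-check on the grid search, the same log-likelihood can be maximised numerically with `optimize()`; the maximiser agrees with the closed-form MLE \(\sum_i k_i / \sum_i n_i\). Simulated stand-in data is used here, since the `sessions` tibble is not reproducible from this document:

```r
binom_log_likelihood <- function(theta, k, n) {
  sum(k * log(theta) + (n - k) * log(1 - theta))
}

# Simulated stand-in for the sessions data.
set.seed(13)
n <- sample(5:35, 100, replace = TRUE)  # spins per session
k <- rbinom(100, n, 0.31)               # hits per session

opt <- optimize(binom_log_likelihood, interval = c(0.001, 0.999),
                k = k, n = n, maximum = TRUE)

# The numerical maximiser matches the closed-form MLE sum(k) / sum(n).
c(opt$maximum, sum(k) / sum(n))
```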
Bayes’ Theorem:
\[ p(\theta|y, X) = \frac{p(y|X, \theta)\, p(\theta)}{p(y)} \propto p(y|X, \theta)\, p(\theta) \]
> … the theory of inverse probability is founded upon error and must be wholly rejected.
>
> Sir Ronald Fisher (1925)
Generate independent samples \(\theta^{(m)}\).
To do this we need to be able to sample directly from \(p(\theta^{(m)} | y, X)\).
Generate a series of samples \(\theta^{(1)}\), \(\theta^{(2)}\), \(\theta^{(3)}\), … \(\theta^{(M)}\) where \(\theta^{(m)}\) depends on \(\theta^{(m-1)}\).
To do this we only need \(p(\theta^{(m)} | \theta^{(m-1)}, y, X)\).
Randomly sample \(\theta^{(0)}\).
Set \(i = 1\).
Propose \(\theta^{*}\) from a proposal distribution centred on \(\theta^{(i-1)}\), draw \(u \sim \text{Uniform}(0, 1)\), and accept or reject:
\[ \theta^{(i)} = \left\{\begin{array}{ll} \theta^{*} & \text{if } u \cdot p(\theta^{(i-1)}|y, X) < p(\theta^{*}|y, X) \\ \theta^{(i-1)} & \text{otherwise.} \end{array}\right. \]
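The update rule above is the Metropolis algorithm. A minimal random-walk sketch in R for the binomial hit-rate posterior under a flat prior, using simulated stand-in data (the real sessions data is not reproducible from this document):

```r
set.seed(42)
n <- rep(20, 100)          # spins per session (assumed)
k <- rbinom(100, n, 0.31)  # hits per session (assumed)

# Log-posterior under a flat prior: just the binomial log-likelihood.
log_post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)
  sum(k * log(theta) + (n - k) * log(1 - theta))
}

M <- 5000
theta <- numeric(M)
theta[1] <- runif(1)  # random starting point
for (i in 2:M) {
  proposal <- theta[i - 1] + rnorm(1, 0, 0.05)  # random-walk proposal
  u <- runif(1)
  # Accept when u * p(theta[i-1] | y) < p(proposal | y), on the log scale.
  if (log(u) + log_post(theta[i - 1]) < log_post(proposal)) {
    theta[i] <- proposal
  } else {
    theta[i] <- theta[i - 1]
  }
}

mean(theta[-(1:1000)])  # close to sum(k) / sum(n)
```

Discarding the first 1000 draws plays the same role as Stan's warmup phase.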
To use rstan you need a `.stan` file (the model) and a `.R` file (the script that compiles and runs it).
```stan
data {
  int<lower=0> N;
  int hits[N];
  int spins[N];
}
parameters {
  real<lower=0, upper=1> theta;  // Bounds keep the sampler inside [0, 1]
}
model {
  hits ~ binomial(spins, theta);  // Likelihood
  theta ~ uniform(0, 1);          // Prior
}
```
```r
library(rstan)

# Use all available cores.
#
options(mc.cores = parallel::detectCores())

trials <- list(
  N = nrow(sessions),
  hits = sessions$hits,
  spins = sessions$spins
)

fit <- stan(
  file = "binomial-no-prior.stan",
  data = trials,
  chains = 2,    # Number of Markov chains
  warmup = 500,  # Warmup iterations per chain
  iter = 1000    # Total iterations per chain
)

class(fit)
```
```
[1] "stanfit"
attr(,"package")
[1] "rstan"
```
```r
samples <- as.data.frame(fit)
head(samples)
```
```
      theta      lp__
1 0.2973639 -1228.064
2 0.3302369 -1228.237
3 0.2955706 -1228.346
4 0.2978613 -1227.991
5 0.3017639 -1227.505
6 0.3228558 -1227.345
```
```r
nrow(samples)
```
```
[1] 1000
```
```r
print(fit, probs = c(0.025, 0.5, 0.975))
```
```
Inference for Stan model: binomial-no-prior.
2 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=1000.

          mean se_mean   sd     2.5%      50%    97.5% n_eff Rhat
theta     0.31    0.00 0.01     0.30     0.31     0.33   386 1.01
lp__  -1227.38    0.03 0.61 -1229.18 -1227.14 -1226.91   465 1.01

Samples were drawn using NUTS(diag_e) at Fri May 11 03:42:39 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```
Use summary() to get information for each chain.
```stan
data {
  int<lower=0> N;
  int hits[N];
  int spins[N];
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  hits ~ binomial(spins, theta);  // Likelihood
  theta ~ beta(2, 2);             // Prior
}
```
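Because the beta prior is conjugate to the binomial likelihood, this particular posterior is also available in closed form, which makes a handy check on the MCMC output. A sketch with simulated stand-in data (the real hits/spins are not reproduced in this document):

```r
set.seed(7)
n <- rep(20, 100)          # spins per session (assumed)
k <- rbinom(100, n, 0.31)  # hits per session (assumed)

# Beta(2, 2) prior + binomial likelihood => Beta posterior:
# theta | k, n ~ Beta(2 + sum(k), 2 + sum(n - k))
a <- 2 + sum(k)
b <- 2 + sum(n - k)

post_mean <- a / (a + b)
post_ci <- qbeta(c(0.025, 0.975), a, b)
c(mean = post_mean, lower = post_ci[1], upper = post_ci[2])
```

With this much data the prior barely moves the estimate away from the MLE, which is why the uniform and beta(2, 2) models give essentially the same theta.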
```stan
data {
  int<lower=0> N;
  real rtp[N];
}
parameters {
  real<lower=0, upper=1> mu;  // Bounded so the beta prior is valid
  real<lower=0> sigma;
}
model {
  rtp ~ normal(mu, sigma);
  mu ~ beta(2, 2);  // Mode = 0.5
}
```
```
Inference for Stan model: normal.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu     0.77    0.00 0.06  0.65  0.73  0.77  0.81  0.88  2719    1
sigma  0.64    0.00 0.04  0.56  0.61  0.64  0.67  0.74  3265    1
lp__  -7.25    0.02 0.94 -9.68 -7.63 -6.96 -6.58 -6.32  1901    1

Samples were drawn using NUTS(diag_e) at Fri May 11 04:26:55 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```
The average RTP is around 77%.
| description | symbols | payout |
|---|---|---|
| no win | | 0 |
| 1x mouse | 🐭 | 1 |
| 1x cat | 🐱 | 2 |
| 2x mouse | 🐭🐭 | 5 |
| 2x cat | 🐱🐱 | 10 |
| 3x mouse | 🐭🐭🐭 | 20 |
| 3x cat | 🐱🐱🐱 | 100 |
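The expected payout per unit wager is the probability-weighted sum of these payouts, which is exactly what the `pay` transformed parameter in the Stan model computes. A quick sketch with illustrative per-spin probabilities (assumed values, roughly in the ballpark of the fitted posterior means):

```r
payouts <- c(1, 2, 5, 10, 20, 100)

# Hypothetical per-spin probabilities for each payline (illustrative only).
p <- c(0.14, 0.068, 0.029, 0.015, 0.007, 0.0015)

pay <- sum(p * payouts)  # expected payout per unit wager
pay
```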
```stan
data {
  int<lower=0> N;
  real rtp[N];
}
parameters {
  real<lower=0, upper=1> mu1;  // Payline 1 -> 1x
  real<lower=0, upper=1> mu2;  // Payline 2 -> 2x
  real<lower=0, upper=1> mu3;  // Payline 3 -> 5x
  real<lower=0, upper=1> mu4;  // Payline 4 -> 10x
  real<lower=0, upper=1> mu5;  // Payline 5 -> 20x
  real<lower=0, upper=1> mu6;  // Payline 6 -> 100x
  real<lower=0> sigma;
}
transformed parameters {
  real<lower=0> pay;
  pay = mu1 * 1 + mu2 * 2 + mu3 * 5 + mu4 * 10 + mu5 * 20 + mu6 * 100;
}
model {
  rtp ~ lognormal(log(pay) - sigma * sigma / 2, sigma);
  //
  // The mean of lognormal() is exp(mu + sigma * sigma / 2).
  //
  mu1 ~ beta(2, 2);   // Mode = 0.5
  mu2 ~ beta(2, 4);   // Mode = 0.25
  mu3 ~ beta(2, 5);   // Mode = 0.2
  mu4 ~ beta(2, 8);   // Mode = 0.125
  mu5 ~ beta(2, 10);  // Mode = 0.1
  mu6 ~ beta(2, 16);  // Mode = 0.0625
}
```
```r
summary(fit)$summary
```
```
               mean      se_mean          sd          2.5%           25%           50%           75%         97.5%
mu1     0.140183104 1.469255e-03 0.089325826  1.879710e-02  7.177827e-02   0.122664752   0.192136367   0.355484201
mu2     0.068228582 6.855675e-04 0.043359095  1.009871e-02  3.522552e-02   0.059594974   0.092294699   0.172075969
mu3     0.028816792 3.014155e-04 0.019063192  3.628413e-03  1.423612e-02   0.025236838   0.039359326   0.074382807
mu4     0.014604142 1.465113e-04 0.009266188  2.047886e-03  7.471645e-03   0.012931411   0.019858629   0.036488600
mu5     0.007310498 7.466952e-05 0.004625688  1.032144e-03  3.817441e-03   0.006498005   0.009835081   0.018626924
mu6     0.001500567 1.496690e-05 0.000946590  1.915782e-04  7.711689e-04   0.001324914   0.002041584   0.003783854
sigma   0.740078933 1.058005e-03 0.057922464  6.352298e-01  7.001666e-01   0.737313210   0.776424743   0.860629274
pay     0.863032278 1.374708e-03 0.079680070  7.286920e-01  8.064539e-01   0.855920287   0.909859834   1.038541926
lp__  -67.997071371 4.994067e-02 2.062849762 -7.287027e+01 -6.912510e+01 -67.655330546 -66.467193435 -65.058522027

         n_eff      Rhat
mu1   3696.236 0.9991526
mu2   4000.000 0.9995266
mu3   4000.000 0.9995504
mu4   4000.000 1.0004483
mu5   3837.655 1.0004658
mu6   4000.000 1.0003130
sigma 2997.221 1.0006985
pay   3359.532 0.9997769
lp__  1706.186 1.0028480
```
The overall RTP is around 86%.
\[ p(\tilde{y}|y) = \int_\theta p(\tilde{y}|\theta) p(\theta|y) d\theta \]
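In practice the integral is approximated by simulation: draw \(\theta\) from the posterior, then draw \(\tilde{y}\) from \(p(\tilde{y}|\theta)\). A minimal sketch in R for the binomial hit-rate model, with an assumed Beta posterior (the counts here are illustrative, not taken from the real sessions data):

```r
set.seed(99)

# Assumed Beta posterior for the hit rate (illustrative counts).
theta_draws <- rbeta(4000, 2 + 620, 2 + 1380)

# For each posterior draw, simulate hits in a replicate session of 25 spins.
y_rep <- rbinom(length(theta_draws), size = 25, prob = theta_draws)

mean(y_rep) / 25  # close to the posterior mean hit rate
```

Comparing the distribution of `y_rep` against the observed hits is the posterior predictive check.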
```r
# trials <- list(
#   N = nrow(sessions),
#   spins = sessions$spins
# )
#
# fit <- stan(
#   file = "poisson.stan",
#   data = trials,
#   chains = 4,
#   warmup = 1000,
#   iter = 2000
# )
#
# extract(fit)
# hist(extract(fit)$lambda)
#
# ggplot(sessions, aes(spins, wager)) + geom_jitter()
```
```stan
data {
  int<lower=0> N;
  vector[N] y;  // Total wager
  vector[N] x;  // Number of spins (standardised)
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + beta * x, sigma);  // Likelihood
  //
  alpha ~ normal(0, 10);  // Prior (Normal)
  beta ~ normal(0, 10);   // Prior (Normal)
  sigma ~ cauchy(0, 5);   // Prior (Cauchy)
}
```
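The data block expects `x` to be standardised. A sketch of preparing the data list in R, using a hypothetical stand-in for the `sessions` columns:

```r
# Hypothetical stand-in for sessions$spins and sessions$wager.
set.seed(3)
spins <- sample(5:35, 100, replace = TRUE)
wager <- round(1.3 * spins + rnorm(100, 0, 3))

# Standardise the predictor so the normal(0, 10) priors are weakly
# informative on a sensible scale.
x_std <- (spins - mean(spins)) / sd(spins)

trials <- list(N = length(spins), y = wager, x = x_std)
c(mean(trials$x), sd(trials$x))  # 0 and 1 after standardisation
```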
```r
# summary(fit)

library(shinystan)

launch_shinystan(fit)
```

In shinystan, open Diagnose -> PPcheck for a posterior predictive check: look at the distribution of the observed data versus replications generated from the model. Do they look similar? “If our model is a good fit then we should be able to use it to generate data that looks a lot like the data we observed.”